118 research outputs found
Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems
Machine Translation (MT) continues to make significant strides in quality and
is increasingly adopted on a larger scale. Consequently, analyses have been
redirected to more nuanced aspects, intricate phenomena, as well as potential
risks that may arise from the widespread use of MT tools. Along this line, this
paper offers a meticulous assessment of three commercial MT systems - Google
Translate, DeepL, and Modern MT - with a specific focus on gender translation
and bias. For three language pairs (English/Spanish, English/Italian, and
English/French), we scrutinize the behavior of such systems at several levels
of granularity and on a variety of naturally occurring gender phenomena in
translation. Our study takes stock of the current state of online MT tools, by
revealing significant discrepancies in the gender translation of the three
systems, with each system displaying varying degrees of bias despite their
overall translation quality.Comment: Under review at HERMES Journa
Good, but not always Fair: An Evaluation of Gender Bias for three Commercial Machine Translation Systems
Machine Translation (MT) continues to make significant strides in quality and is increasingly adopted on a larger scale. Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems - Google Translate, DeepL, and Modern MT - with a specific focus on gender translation and bias. For three language pairs (English-Spanish, English-Italian, and English-French), we scrutinize the behavior of such systems at several levels of granularity and on a variety of naturally occurring gender phenomena in translation. Our study takes stock of the current state of online MT tools, by revealing significant discrepancies in the gender translation of the three systems, with each system displaying varying degrees of bias despite their overall translation quality
How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
When translating from notional gender languages (e.g., English) into
grammatical gender languages (e.g., Italian), the generated translation
requires explicit gender assignments for various words, including those
referring to the speaker. When the source sentence does not convey the
speaker's gender, speech translation (ST) models either rely on the
possibly-misleading vocal traits of the speaker or default to the masculine
gender, the most frequent in existing training corpora. To avoid such biased
and not inclusive behaviors, the gender assignment of speaker-related
expressions should be guided by externally-provided metadata about the
speaker's gender. While previous work has shown that the most effective
solution is represented by separate, dedicated gender-specific models, the goal
of this paper is to achieve the same results by integrating the speaker's
gender metadata into a single "multi-gender" neural ST model, easier to
maintain. Our experiments demonstrate that a single multi-gender model
outperforms gender-specialized ones when trained from scratch (with gender
accuracy gains up to 12.9 for feminine forms), while fine-tuning from existing
ST models does not lead to competitive results.Comment: To appear in CLiC-it 202
MAGMATic: A Multi-domain Academic Gold Standard with Manual Annotation of Terminology for Machine Translation Evaluation
This paper presents MAGMATic (Multidomain Academic Gold Standard with Manual Annotation of Terminology), a novel Italian–English benchmark which allows MT evaluation focused on terminology translation. The data set comprises 2,056 parallel sentences extracted from institutional academic texts, namely course unit and degree program descriptions. This text type is particularly interesting since it contains terminology from multiple domains, e.g. education and different academic disciplines described in the texts. All terms in the English target side of the data set were manually identified and annotated with a domain label, for a total of 7,517 annotated terms. Due to their peculiar features, institutional academic texts represent an interesting test bed for MT. As a further contribution of this paper, we investigate the feasibility of exploiting MT for the translation of this type of documents. To this aim, we evaluate two stateof-the-art Neural MT systems on MAGMATic, focusing on their ability to translate domain-specific terminology
Do translator trainees trust machine translation? An experiment on post-editing and revision
Despite the importance of trust in any work environment, this concept has rarely been investigated for MT. The present contribution aims at filling this gap by presenting a post-editing experiment carried out with translator trainees. An institutional academic text was translated from Italian into English. All participants worked on the same target text. Half of them were told that the text was a human translation needing revision, while the other half was told that it was an MT output to be postedited. Temporal and technical effort were measured based on words per second and HTER. Results were complemented with a manual analysis of a subset of the observations
Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES
As part of the WMT-2023 "Test suites" shared task, in this paper we summarize
the results of two test suites evaluations: MuST-SHE-WMT23 and INES. By
focusing on the en-de and de-en language pairs, we rely on these newly created
test suites to investigate systems' ability to translate feminine and masculine
gender and produce gender-inclusive translations. Furthermore we discuss
metrics associated with our test suites and validate them by means of human
evaluations. Our results indicate that systems achieve reasonable and
comparable performance in correctly translating both feminine and masculine
gender forms for naturalistic gender phenomena. Instead, the generation of
inclusive language forms in translation emerges as a challenging task for all
the evaluated MT models, indicating room for future improvements and research
on the topic.Comment: Accepted at WMT 202
On the Dynamics of Gender Learning in Speech Translation
Due to the complexity of bias and the opaque nature of current neural approaches, there is a rising interest in auditing language technologies. In this work, we contribute to such a line of inquiry by exploring the emergence of gender bias in Speech Translation (ST). As a new perspective, rather than focusing on the final systems only, we examine their evolution over the course of training. In this way, we are able to account for different variables related to the learning dynamics of gender translation, and investigate when and how gender divides emerge in ST. Accordingly, for three language pairs (en ? es, fr, it) we compare how ST systems behave for masculine and feminine translation at several levels of granularity. We find that masculine and feminine curves are dissimilar, with the feminine one being characterized by more erratic behaviour and late improvements over the course of training. Also, depending on the considered phenomena, their learning trends can be either antiphase or parallel. Overall, we show how such a progressive analysis can inform on the reliability and time-wise acquisition of gender, which is concealed by static evaluations and standard metrics
No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation
Automatic speech recognition (ASR) systems are known to be sensitive to the
sociolinguistic variability of speech data, in which gender plays a crucial
role. This can result in disparities in recognition accuracy between male and
female speakers, primarily due to the under-representation of the latter group
in the training data. While in the context of hybrid ASR models several
solutions have been proposed, the gender bias issue has not been explicitly
addressed in end-to-end neural architectures. To fill this gap, we propose a
data augmentation technique that manipulates the fundamental frequency (f0) and
formants. This technique reduces the data unbalance among genders by simulating
voices of the under-represented female speakers and increases the variability
within each gender group. Experiments on spontaneous English speech show that
our technique yields a relative WER improvement up to 9.87% for utterances by
female speakers, with larger gains for the least-represented f0 ranges.Comment: Accepted at ASRU 202
Towards a methodology for evaluating automatic subtitling
In response to the increasing interest towards automatic subtitling, this EAMT-funded project aimed at collecting subtitle post-editing data in a real use case scenario where professional subtitlers edit automatically generated subtitles. The post-editing setting includes, for the first time, automatic generation of timestamps and segmentation, and focuses on the effect of timing and segmentation edits on the post-editing process. The collected data will serve as the basis for investigating how subtitlers interact with automatic subtitling and for devising evaluation methods geared to the multimodal nature and formal requirements of subtitling
- …